Predicting Human Perceived Accuracy of ASR Systems
Abstract
Word error rate (WER), the most commonly used measure of automatic speech recognition (ASR) accuracy, penalizes all types of ASR errors equally. Humans, however, weigh different types of ASR errors differently: they judge errors that distort the meaning of the spoken message more harshly than those that do not. To align more closely with human perception of ASR accuracy, we developed a new metric, HPA (Human Perceived Accuracy), that predicts the subjective perceived accuracy of ASR transcriptions. HPA is built on the central idea of weighting different ASR errors differently. When applied to the task of automatically recognizing voicemails, HPA correlated significantly more strongly with human judgement of ASR accuracy (r = 0.91) than WER did (r = 0.65).
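The contrast between uniform and differential error weighting can be illustrated with a minimal Python sketch. It is not the HPA formula, which the abstract does not specify. It only shows a weighted edit distance in which each word carries its own cost. The function name `weighted_error_rate`, the `weight` callback, and the filler-word set are assumptions made for illustration. With the default weight of 1.0 per word, the function reduces to ordinary WER.

```python
from typing import Callable, List


def weighted_error_rate(
    ref: List[str],
    hyp: List[str],
    weight: Callable[[str], float] = lambda word: 1.0,
) -> float:
    """Weighted edit distance between ref and hyp, normalized by total ref weight.

    Hypothetical illustration only: with weight(word) == 1.0 this is plain WER.
    """
    rows, cols = len(ref) + 1, len(hyp) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column: delete every reference word seen so far.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + weight(ref[i - 1])
    # First row: insert every hypothesis word seen so far.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + weight(hyp[j - 1])

    for i in range(1, rows):
        for j in range(1, cols):
            r, h = ref[i - 1], hyp[j - 1]
            # A match is free; a substitution costs the heavier of the two words.
            diag = dist[i - 1][j - 1] + (0.0 if r == h else max(weight(r), weight(h)))
            delete = dist[i - 1][j] + weight(r)
            insert = dist[i][j - 1] + weight(h)
            dist[i][j] = min(diag, delete, insert)

    total = sum(weight(word) for word in ref)
    return dist[-1][-1] / total if total else 0.0


# Assumed weighting: filler words barely matter, all other words count fully.
FILLERS = {"uh", "um"}


def filler_weight(word: str) -> float:
    return 0.1 if word in FILLERS else 1.0


ref = "please call me back at five".split()
meaning_changed = "please call me back at nine".split()
filler_inserted = "please call me back at uh five".split()

# Plain WER scores both hypotheses identically (one error in six words).
print(weighted_error_rate(ref, meaning_changed), weighted_error_rate(ref, filler_inserted))
# The weighted variant penalizes the meaning-changing error more heavily.
print(
    weighted_error_rate(ref, meaning_changed, filler_weight),
    weighted_error_rate(ref, filler_inserted, filler_weight),
)
```

In this toy example both hypotheses contain a single error, so uniform weighting cannot tell them apart, while the assumed weighting ranks the substituted time ("nine" for "five") as far worse than an inserted filler word.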
Similar resources
Predicting Barge-in Utterance Errors by using Implicitly-Supervised ASR Accuracy and Barge-in Rate per User
Modeling of individual users is a promising way of improving the performance of spoken dialogue systems deployed for the general public and utilized repeatedly. We define “implicitly-supervised” ASR accuracy per user on the basis of responses following the system’s explicit confirmations. We combine the estimated ASR accuracy with the user’s barge-in rate, which represents how well the user is ...
Concept Form Adaptation in Human-Computer Dialog
In this work we examine user adaptation to a dialog system’s choice of realization of task-related concepts. We analyze forms of the time concept in the Let’s Go! spoken dialog system. We find that users adapt to the system’s choice of time form. We also find that user adaptation is affected by perceived system adaptation. This means that dialog systems can guide users’ word choice and can adap...
Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems
We exploit the barge-in rate of individual users to predict automatic speech recognition (ASR) errors. A barge-in is a situation in which a user starts speaking during a system prompt, and it can be detected even when ASR results are not reliable. Such features not using ASR results can be a clue for managing a situation in which user utterances cannot be successfully recognized. Since individu...
Factors that influence the performance of experienced speech recognition users.
Performance on automatic speech recognition (ASR) systems for users with physical disabilities varies widely between individuals. The goal of this study was to discover some key factors that account for that variation. Using data from 23 experienced ASR users with physical disabilities, the effect of 20 different independent variables on recognition accuracy and text entry rate with ASR was mea...
DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech Scoring
When applying automated speech-scoring technology to the rating of globally administered real assessments, there are several practical challenges: (a) ASR accuracy on non-native spontaneous speech is generally low; (b) due to the data mismatch between an ASR system's training stage and its final usage, the recognition accuracy obtained in practice is even lower; (c) content-relevance was not wid...